Semi-supervised Learning of Instance-level Recognition from Video

Authors

  • Daniel Rueckert
  • Ben Glocker
  • Ilija Radosavovic

Abstract

In recent years, data-driven approaches that take advantage of the availability of large manually annotated datasets have proven effective at visual perception tasks. We have witnessed particularly rapid improvements in the image classification task, where machines now surpass human-level performance. However, progress in certain instance-level recognition tasks, such as object segmentation and human pose estimation, has not been as rapid. In part, the rate of improvement has been held back by the difficulty of manually annotating very large amounts of data for these tasks. In this work we explore whether semi-supervised learning approaches that utilise unlabelled video data can provide a reasonably effective alternative to manually annotated data for instance-level recognition. Central to our approach is the idea that good models should consistently predict the same label for different pose and view variations of the same object instance. We employ a hard example mining heuristic to find video frames in which the model makes mistakes, and correct them by combining the information from the remaining video frames. Noting that video is just one source of such transformations, we generalise our approach to unlabelled images and apply it to the human pose estimation task. The resulting technique, which we call keypoint data distillation (KDD), is simple and very effective. Using a collection of unlabelled video frames, we show that the Mask R-CNN model combined with KDD achieves state-of-the-art results on the COCO Keypoint Challenge, outperforming all other entries by a significant margin.
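The consistency idea in the abstract (predict under several views of the same instance, combine the views into one label, and flag the views that disagree) can be illustrated with a minimal sketch. It assumes, hypothetically, that each unlabelled image has been run through the model under a few known rescalings, that predictions are plain (x, y) keypoint lists, and that views are combined by a per-coordinate median. The names `distill_keypoints`, `unscale`, `aggregate`, `mean_error` and the pixel `threshold` are illustrative choices; this is not the author's KDD implementation.

```python
import math
from statistics import median
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float]
Keypoints = List[Point]


def unscale(factor: float) -> Callable[[Point], Point]:
    """Map a point from a view rescaled by `factor` back to original image coordinates.

    Rescaling is an assumed, illustrative transform; any invertible geometric
    transform (e.g. a horizontal flip plus left/right keypoint swapping) could be used.
    """
    return lambda p: (p[0] / factor, p[1] / factor)


def aggregate(predictions: Sequence[Keypoints]) -> Keypoints:
    """Consensus keypoints: per-keypoint, per-coordinate median over all views."""
    num_keypoints = len(predictions[0])
    return [
        (median(kps[k][0] for kps in predictions),
         median(kps[k][1] for kps in predictions))
        for k in range(num_keypoints)
    ]


def mean_error(kps: Keypoints, reference: Keypoints) -> float:
    """Mean Euclidean distance between corresponding keypoints."""
    return sum(math.dist(a, b) for a, b in zip(kps, reference)) / len(reference)


def distill_keypoints(
    view_predictions: Sequence[Keypoints],
    inverse_maps: Sequence[Callable[[Point], Point]],
    threshold: float,
) -> Tuple[Keypoints, List[int]]:
    """Return (pseudo_label, hard_view_indices) for one unlabelled image.

    view_predictions[i] holds the model's keypoints on transformed view i, in that
    view's coordinates; inverse_maps[i] maps them back to the original image.
    A view counts as a hard example if its mean error against the consensus
    exceeds `threshold` (in original-image pixels; a hypothetical setting).
    """
    mapped = [[inv(p) for p in kps] for kps, inv in zip(view_predictions, inverse_maps)]
    pseudo_label = aggregate(mapped)
    hard = [i for i, kps in enumerate(mapped) if mean_error(kps, pseudo_label) > threshold]
    return pseudo_label, hard


if __name__ == "__main__":
    # Toy predictions for two keypoints under rescalings 1.0, 2.0 and 0.5;
    # the third view gets the second keypoint wrong.
    preds = [
        [(10, 20), (30, 40)],
        [(20, 40), (60, 80)],
        [(5, 10), (40, 5)],
    ]
    maps = [unscale(1.0), unscale(2.0), unscale(0.5)]
    label, hard = distill_keypoints(preds, maps, threshold=5.0)
    print(label, hard)  # [(10.0, 20.0), (30.0, 40.0)] [2]
```

The median is used here so that a single inconsistent view cannot pull the pseudo-label away from the other views; averaging heatmaps or confidence-weighted voting would be equally plausible aggregation schemes under different assumptions.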

Similar resources

Recognition of Visual Events using Spatio-Temporal Information of the Video Signal

Recognition of visual events as a video analysis task has become popular in the machine learning community. While traditional approaches for the detection of video events have been used for a long time, recently developed deep learning based methods have revolutionized this area. They have enabled event recognition systems to achieve detection rates which were not reachable by traditional approac...

Instance-level Semisupervised Multiple Instance Learning

Multiple instance learning (MIL) is a branch of machine learning that attempts to learn information from bags of instances. Many real-world applications such as localized content-based image retrieval and text categorization can be viewed as MIL problems. In this paper, we propose a new graph-based semi-supervised learning approach for multiple instance learning. By defining an instance-level g...

Learning Pain from Action Unit Combinations: A Weakly Supervised Approach via Multiple Instance Learning

Patient pain can be detected highly reliably from facial expressions using a set of facial muscle-based action units (AUs) defined by the Facial Action Coding System (FACS). A key characteristic of facial expression of pain is the simultaneous occurrence of pain-related AU combinations, whose automated detection would be highly beneficial for efficient and practical pain monitoring. Existing ge...

Watch and Learn: Semi-Supervised Learning for Object Detectors From Video

We present a semi-supervised approach that localizes multiple unknown object instances in long videos. We start with a handful of labeled boxes and iteratively learn and label hundreds of thousands of object instances. We propose criteria for reliable object detection and tracking for constraining the semi-supervised learning process and minimizing semantic drift. Our approach does not assume e...

Multiview Hessian regularized logistic regression for action recognition

With the rapid development of social media sharing, people often need to manage the growing volume of multimedia data, for example through large-scale video classification and annotation, especially to organize videos containing human activities. Recently, manifold regularized semi-supervised learning (SSL), which explores the intrinsic data probability distribution and then improves the generalizatio...

Publication date: 2017